Combining Strategies for Extracting Relations from Text Collections

Authors

  • Eugene Agichtein
  • Eleazar Eskin
  • Luis Gravano

Abstract

Text documents often contain valuable structured data that is hidden in regular English sentences. This data is best exploited if available as a relational table that we could use for answering precise queries or for running data mining tasks. Our Snowball system extracts these relations from document collections starting with only a handful of user-provided example tuples. Based on these tuples, Snowball generates patterns that are used, in turn, to find more tuples. In this paper we introduce a new pattern and tuple generation scheme for Snowball, with different strengths and weaknesses than those of our original system. We also show preliminary results on how we can combine the two versions of Snowball to extract tuples more accurately.
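The bootstrapping loop the abstract describes (seed tuples yield patterns, and those patterns yield more tuples) can be illustrated with a minimal Python sketch. This is not Snowball's actual pattern representation, and it is not the new pattern and tuple generation scheme this paper introduces. It makes several assumptions. Sentences arrive pre-tagged with hypothetical `<ORG>`/`<LOC>` markup. A "pattern" is simply the literal text between the two entities. A pattern is kept only if it co-occurs with at least `min_support` already-known tuples. The function names, the markup, and the toy company data are all illustrative.

```python
import re
from collections import Counter

# Hypothetical markup: organizations wrapped as <ORG>...</ORG> and locations
# as <LOC>...</LOC>. A real extractor would rely on a named-entity tagger.
TAGGED = re.compile(r"<ORG>(.+?)</ORG>(.+?)<LOC>(.+?)</LOC>")


def occurrences(corpus):
    """Yield (organization, middle_context, location) for each tagged pair."""
    for sentence in corpus:
        for org, middle, loc in TAGGED.findall(sentence):
            yield org.strip(), middle.strip().lower(), loc.strip()


def learn_patterns(corpus, known, min_support=2):
    """Keep middle contexts seen with at least min_support known tuples."""
    support = Counter()
    for org, middle, loc in occurrences(corpus):
        if (org, loc) in known:
            support[middle] += 1
    return {middle for middle, n in support.items() if n >= min_support}


def apply_patterns(corpus, patterns):
    """Extract every (organization, location) pair whose context is a pattern."""
    return {(org, loc) for org, middle, loc in occurrences(corpus) if middle in patterns}


def bootstrap(corpus, seeds, iterations=3, min_support=2):
    """Alternate pattern learning and tuple extraction, growing the tuple set."""
    tuples = set(seeds)
    for _ in range(iterations):
        patterns = learn_patterns(corpus, tuples, min_support)
        new_tuples = apply_patterns(corpus, patterns) - tuples
        if not new_tuples:
            break  # fixed point: the current patterns find nothing new
        tuples |= new_tuples
    return tuples


# Toy data with fictional companies (illustrative only).
corpus = [
    "<ORG>Acme</ORG> is headquartered in <LOC>Springfield</LOC>.",
    "<ORG>Globex</ORG> is headquartered in <LOC>Cypress Creek</LOC>.",
    "<ORG>Initech</ORG> is headquartered in <LOC>Austin</LOC>.",
    "<ORG>Acme</ORG>, based in <LOC>Springfield</LOC>, posted record sales.",
    "<ORG>Initech</ORG>, based in <LOC>Austin</LOC>, hired new staff.",
    "<ORG>Hooli</ORG>, based in <LOC>Palo Alto</LOC>, announced a merger.",
]
seeds = {("Acme", "Springfield"), ("Globex", "Cypress Creek")}

print(sorted(bootstrap(corpus, seeds)))
# [('Acme', 'Springfield'), ('Globex', 'Cypress Creek'), ('Hooli', 'Palo Alto'), ('Initech', 'Austin')]
```

Starting from two seeds, the first round learns only "is headquartered in", because the ", based in" context co-occurs with just one seed. That pattern yields Initech. In the second round, ", based in" now has enough support, and it finds Hooli. The `min_support` threshold is a crude stand-in for the pattern and tuple confidence estimates a practical system needs. Without some filter of this kind, a single noisy context could pull in many wrong tuples.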

Similar resources

Facilitating Image Exploratory Search with Relations

Traditional text-based image retrieval does not fully support queries on semantic relationships between two entities. To better help the exploratory search on image collections, this paper presents a system for automatically extracting the relations between entities by analyzing the sentence dependency on the descriptions of the images. Our results demonstrate that using the extracted relations...

KELVIN: Extracting Knowledge from Large Text Collections

We describe the KELVIN system for extracting entities and relations from large text collections and its use in the TAC Knowledge Base Population Cold Start task run by the U.S. National Institute of Standards and Technology. The Cold Start task starts with an empty knowledge base defined by an ontology of entity types, properties and relations. Evaluations in 2012 and 2013 were done using a col...

A hybrid approach for extracting semantic relations from texts

We present an approach for extracting relations from texts that exploits linguistic and empirical strategies, by means of a pipeline method involving a parser, part-of-speech tagger, named entity recognition system, pattern-based classification and word sense disambiguation models, and resources such as ontology, knowledge base and lexical databases. The relations extracted can be used for vario...

Inducing hyperlinking rules in text collections

Automatic hyperlinking methods based on Information Extraction techniques and on linking rules firing on salient facts have been proposed to connect documents with “typed” relations. However, the activity of defining link types and writing linking rules may be cumbersome due to the large number of possibilities. In this paper, we tackle this issue proposing a model for automatically extracting ...

Text-based Knowledge Acquisition for Ontology Engineering

This paper describes an approach towards ontology engineering that makes use of text technology for extracting relevant semantic relations from document collections. A short description of corpus characteristics and examples of statistical text analysis results show how input for ontology design can be generated automatically. The Topic Map standard is used as an example for standardised repres...

Publication date (Agichtein, Eskin, Gravano): 2000